install.packages("dplyr")
Error in install.packages : Updating loaded packages
install.packages("plotly")
Error in install.packages : Updating loaded packages
install.packages("rmarkdown")
Error in install.packages : Updating loaded packages
library(dplyr)
library(plotly)
library(rmarkdown)
For the Exam, we will use the Import Dataset menu on the right instead of uploading from within the notebook.
Let’s continue to work with Exercise 1.21
y<-CH01PR21$V1 #This code renames the variables. Note that variables are labeled "V1" not "X1"
x<-CH01PR21$V2
The following code executes a simple linear regression analysis from model fitting to anova output to interactive scatterplot with line of best fit. See comments below.
Fit the model:
fit1.21<-lm(y~x)
Scatterplot:
# Create the scatterplot with a line of best fit
scatter_plot <- plot_ly(data=CH01PR21, x = ~x, y = ~y, mode = "markers", type = "scatter", name = "Data") %>%
add_trace(mode = "lines", type = "scatter", line = list(color = "red"), y = lm(y~x,data=CH01PR21)$fitted.values, name = "Line of Best Fit") %>%
layout(title = "Scatterplot with Line of Best Fit",
xaxis = list(title = "X-Axis Label"),
yaxis = list(title = "Y-Axis Label"))
# Print the plot
print(scatter_plot)
NULL
Model summary with parameter estimates, standard errors, test statistics, p-values and more Exercise 2.6b
summary(fit1.21)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-2.2 -1.2 0.3 0.8 1.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.2000 0.6633 15.377 3.18e-07 ***
x 4.0000 0.4690 8.528 2.75e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.483 on 8 degrees of freedom
Multiple R-squared: 0.9009, Adjusted R-squared: 0.8885
F-statistic: 72.73 on 1 and 8 DF, p-value: 2.749e-05
Parameter Estimation Exercise 2.6a
confint(fit1.21,level=0.95)
2.5 % 97.5 %
(Intercept) 8.670370 11.729630
x 2.918388 5.081612
Let’s verify
print("Lower limit")
[1] "Lower limit"
fit1.21$coefficients[2]-qt(.025,8,lower.tail=F)*((sum(fit1.21$residuals^2)/8)/(sum((x-mean(x))^2)))^.5
x
2.918388
print("Upper limit")
[1] "Upper limit"
fit1.21$coefficients[2]+qt(.025,8,lower.tail=F)*((sum(fit1.21$residuals^2)/8)/(sum((x-mean(x))^2)))^.5
x
5.081612
fit1.21$residuals
1 2 3 4 5 6 7 8 9 10
1.8 -1.2 -1.2 1.8 -0.2 -1.2 -2.2 0.8 0.8 0.8
Confidence intervals for mean response and prediction intervals for future independent responses. Exercise 2.15ab
#Create a data frame with new explanatory values
newdata<-data.frame(x=c(2,4))
print("Fitted value, lower and upper bounds for confidence interval")
[1] "Fitted value, lower and upper bounds for confidence interval"
predict(fit1.21,newdata,interval="confidence",level = .99)
fit lwr upr
1 18.2 15.97429 20.42571
2 26.2 21.22316 31.17684
#The two rows of out correspond to x=2 and x=4 respectively
print("Fitted value, lower and upper bounds for prediction interval")
[1] "Fitted value, lower and upper bounds for prediction interval"
predict(fit1.21,newdata,interval="predict",level = .99)
fit lwr upr
1 18.2 12.74814 23.65186
2 26.2 19.16168 33.23832
summary(fit1.21)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-2.2 -1.2 0.3 0.8 1.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.2000 0.6633 15.377 3.18e-07 ***
x 4.0000 0.4690 8.528 2.75e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.483 on 8 degrees of freedom
Multiple R-squared: 0.9009, Adjusted R-squared: 0.8885
F-statistic: 72.73 on 1 and 8 DF, p-value: 2.749e-05
Confidence band
pf121 = predict(fit1.21,newdata)
pf121
1 2
18.2 26.2
n<-dim(CH01PR21)[1]
alpha<-0.01
xh<-2
W<-(2*qf(1-alpha,2,n-2))^0.5
MSE<-sum(fit1.21$residuals^2)/(n-2)
kxx<-sum((x-mean(x))^2)
print("Lower limit")
[1] "Lower limit"
pf121[1]-W*(MSE*(1/n+(xh-mean(x))^2/kxx))^.5
1
15.44116
print("Upper limit")
[1] "Upper limit"
pf121[1]+W*(MSE*(1/n+(xh-mean(x))^2/kxx))^.5
1
20.95884
n<-dim(CH01PR21)[1]
alpha<-0.01
xh<-4
W<-(2*qf(1-alpha,2,n-2))^0.5
MSE<-sum(fit1.21$residuals^2)/(n-2)
kxx<-sum((x-mean(x))^2)
print("Lower limit")
[1] "Lower limit"
pf121[2]-W*(MSE*(1/n+(xh-mean(x))^2/kxx))^.5
2
20.03104
print("Upper limit")
[1] "Upper limit"
pf121[2]+W*(MSE*(1/n+(xh-mean(x))^2/kxx))^.5
2
32.36896
(MSE*(1/n+(xh-mean(x))^2/kxx))
[1] 2.2
Anova table
anova(fit1.21)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x 1 160.0 160.0 72.727 2.749e-05 ***
Residuals 8 17.6 2.2
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As done above, first import the data using the menu on the right. Then assign variables:
y<-CH06PR18$V1 #This code renames the variables
x1<-CH06PR18$V2
x2<-CH06PR18$V3
x3<-CH06PR18$V4
x4<-CH06PR18$V5
Fit the Multiple Linear Regression (MLR) model:
fit6.18<-lm(y~x1+x2+x3+x4)
The Big Picture Model Summary Model summary with parameter estimates, standard errors, test statistics, p-values and more
summary(fit6.18)
Call:
lm(formula = y ~ x1 + x2 + x3 + x4)
Residuals:
Min 1Q Median 3Q Max
-3.1872 -0.5911 -0.0910 0.5579 2.9441
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.220e+01 5.780e-01 21.110 < 2e-16 ***
x1 -1.420e-01 2.134e-02 -6.655 3.89e-09 ***
x2 2.820e-01 6.317e-02 4.464 2.75e-05 ***
x3 6.193e-01 1.087e+00 0.570 0.57
x4 7.924e-06 1.385e-06 5.722 1.98e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.137 on 76 degrees of freedom
Multiple R-squared: 0.5847, Adjusted R-squared: 0.5629
F-statistic: 26.76 on 4 and 76 DF, p-value: 7.272e-14